
gh-73936: Add hashlib.saslprep (RFC 4013) and use it in smtplib for Unicode credentials#148692

Open
blink1073 wants to merge 3 commits into python:main from blink1073:unicode-passwords-smtplib


blink1073 commented Apr 17, 2026

This is a continuation of #103611, updated to address all open reviewer comments.

Changes

  • Lib/_saslprep.py (new): RFC 4013 SASLprep implementation, adapted from
    the PyMongo project with
    permission. Apache 2.0 licence header retained; MongoDB corporate CLA is
    signed.
  • Lib/hashlib.py: exposes saslprep as hashlib.saslprep (public API),
    per the suggestion from @gpshead in the original review.
  • Lib/smtplib.py: applies hashlib.saslprep() in auth_plain(),
    auth_login(), and auth_cram_md5(); switches auth() encoding from
    'ascii' to 'utf-8'.
  • Lib/test/test_saslprep.py (new): tests for RFC 4013 examples,
    character mapping, prohibited characters, bidirectional checks, unassigned
    code points, and test cases from the MongoDB JS saslprep library.
  • Lib/test/test_smtplib.py: adds Unicode credentials to the simulated
    server (Devanagari username/password; a password that SASLprep normalises via
    NFKC) and exercises all three auth mechanisms with them, including verifying
    that SASLprep-equivalent passwords authenticate successfully.
  • Python/stdlib_module_names.h: registers _saslprep.
  • Doc/library/hashlib.rst: new "String preparation" section documenting
    hashlib.saslprep().
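To make the RFC 4013 steps concrete, here is a rough sketch built on the stdlib `stringprep` module: map, normalize with NFKC, reject prohibited output, then apply the RFC 3454 bidirectional check. It is not the PR's `Lib/_saslprep.py`. The name `saslprep_sketch` and its `allow_unassigned` flag are hypothetical, and rejecting unassigned code points by default (the RFC's "stored strings" rule) is an assumption.

```python
import stringprep
import unicodedata

# Prohibited-output tables listed in RFC 4013 section 2.3 (RFC 3454 appendix C).
_PROHIBITED = (
    stringprep.in_table_c12,      # non-ASCII space characters
    stringprep.in_table_c21_c22,  # ASCII and non-ASCII control characters
    stringprep.in_table_c3,       # private use
    stringprep.in_table_c4,       # non-character code points
    stringprep.in_table_c5,       # surrogate codes
    stringprep.in_table_c6,       # inappropriate for plain text
    stringprep.in_table_c7,       # inappropriate for canonical representation
    stringprep.in_table_c8,       # change display properties or deprecated
    stringprep.in_table_c9,       # tagging characters
)


def saslprep_sketch(data: str, allow_unassigned: bool = False) -> str:
    """Hypothetical illustration of RFC 4013 SASLprep; not the PR's code."""
    # 1. Map: non-ASCII spaces become U+0020; "commonly mapped to nothing" is dropped.
    data = "".join(
        " " if stringprep.in_table_c12(ch) else ch
        for ch in data
        if not stringprep.in_table_b1(ch)
    )
    # 2. Normalize with NFKC, using Unicode 3.2 as stringprep (RFC 3454) requires.
    data = unicodedata.ucd_3_2_0.normalize("NFKC", data)
    if not data:
        return data
    # 3. Reject prohibited output and (assumed default) unassigned code points.
    for ch in data:
        if any(in_table(ch) for in_table in _PROHIBITED):
            raise ValueError(f"SASLprep: prohibited character {ch!r}")
        if not allow_unassigned and stringprep.in_table_a1(ch):
            raise ValueError(f"SASLprep: unassigned code point {ch!r}")
    # 4. Bidi check (RFC 3454 section 6): a string containing RandALCat characters
    #    must contain no LCat characters and must start and end with RandALCat.
    if any(stringprep.in_table_d1(ch) for ch in data):
        if any(stringprep.in_table_d2(ch) for ch in data):
            raise ValueError("SASLprep: mixed RandALCat and LCat characters")
        if not (stringprep.in_table_d1(data[0]) and stringprep.in_table_d1(data[-1])):
            raise ValueError("SASLprep: RandALCat string must start and end with RandALCat")
    return data
```

Under these assumptions, the RFC 4013 section 3 examples behave as the RFC describes: `"I\u00adX"` and `"\u2168"` both prepare to `"IX"`, while `"\u0007"` (a prohibited control character) and `"\u0627\u0031"` (which fails the bidi check) raise `ValueError`. This is what makes two visually different passwords "SASLprep-equivalent": they prepare to the same string before being sent.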

📚 Documentation preview 📚: https://cpython-previews--148692.org.readthedocs.build/

blink1073 and others added 2 commits April 17, 2026 11:02
…e credentials

Co-authored-by: Arnt Gulbrandsen <arnt@gulbrandsen.priv.no>
Co-authored-by: Bernie Hackett <bernie.hackett@gmail.com>
blink1073 requested review from a team, gpshead and picnixz as code owners April 17, 2026 16:05

python-cla-bot bot commented Apr 17, 2026

All commit authors signed the Contributor License Agreement.

CLA signed

blink1073 (Author)

Kicking to re-run the CLA bot

blink1073 closed this Apr 17, 2026
blink1073 reopened this Apr 17, 2026
picnixz (Member) commented Apr 19, 2026

I am not really fond of having it in hashlib. It has nothing to do with hashing and cryptographic primitives. Unfortunately, a new module usually requires a PEP. I also do not think it is right to change smtplib in the same PR.

I am on mobile so hard to check, but what RFC are we guaranteeing to follow for SMTP?

warsaw (Member) commented Apr 19, 2026

what RFC are we guaranteeing to follow for SMTP?

This is a good question. Supporting all the various RFCs related to email may just be too much for the stdlib, depending on whether we have a dedicated maintainer for it. We split out server-side support into aiosmtpd in Python 3.12. It might be time for client-side as well, but OTOH, it is handy to have basic client-side SMTP support in the stdlib.

I'm not really involved in any of this any more, so I don't really have a say.

Aside: I still think it makes sense to have a standards-compliant email parsing library in the stdlib.

picnixz (Member) commented Apr 20, 2026

I would prefer the following:

  • if the current design does not make it easy to override the current ASCII-only behavior to allow UTF-8, make it so (e.g. by adding an encoding parameter instead of hardcoding "ascii" everywhere).
  • if extending the current interface is easy, SASLprep support should be left to a third-party module.
  • hashlib should stay out of it.
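A minimal sketch of the encoding-parameter idea, written as free-standing helpers rather than a change to smtplib itself: the AUTH PLAIN response is built the way smtplib's `auth_plain()` builds it (`"\0user\0password"`), then base64-encoded with a caller-chosen encoding instead of a hardcoded `"ascii"`. The names `encode_auth_response` and `plain_initial_response` and the `encoding` parameter are hypothetical, not existing smtplib API.

```python
import base64


def plain_initial_response(user: str, password: str) -> str:
    # AUTH PLAIN message (RFC 4616): authzid NUL authcid NUL passwd,
    # here with an empty authorization identity.
    return f"\0{user}\0{password}"


def encode_auth_response(response: str, encoding: str = "ascii") -> str:
    """Hypothetical helper: base64-encode an SMTP AUTH response.

    The default keeps today's ASCII-only behavior; non-ASCII credentials
    raise UnicodeEncodeError unless the caller opts into e.g. "utf-8".
    """
    return base64.b64encode(response.encode(encoding)).decode("ascii")
```

With the default, `encode_auth_response(plain_initial_response("user", "pass"))` gives `"AHVzZXIAcGFzcw=="`, while a non-ASCII password such as `"pässword"` raises `UnicodeEncodeError` unless the caller passes `encoding="utf-8"`. Keeping ASCII as the default would leave existing callers unaffected.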

blink1073 (Author)

The motivating issue asked for Unicode password support in smtplib. #103611 proposed using saslprep.py from MongoDB to accomplish the task.

I added it to hashlib based on this comment from @gpshead.

There may be a simpler approach to adding Unicode password support using RFC 6531.

My main intent was to help carry the work from #103611 across the finish line, since I am a maintainer of the saslprep.py code.

bitdancer (Member)

Well, I'm more or less once again the maintainer for smtplib after a long absence, though my primary focus is the email library. Right now I'm mostly focused there, rewriting the header parser to deal with a security-adjacent performance issue, so my head is currently loaded with the email RFCs, not the SMTP RFCs :)

I don't think RFC 6531 helps here; at least, a search for 'pass' does not get any hits. I think it is focused on the data, not on the auth mechanisms, but it has been a while since I scanned it, much less read it.

The issue you picked up from mentions saslprep being required by the RFC. It would be interesting to have a direct reference for that.

I think my comments on the original issue are relevant here: if we change smtplib so that you can pass binary (bytes) passwords, then users can at least implement what they personally need, even if smtplib doesn't directly support saslprep.

Since SASLprep is a standard and would directly enhance smtplib, imaplib, and poplib (if I understand correctly), I think supporting it in the stdlib would be nice, and the code contribution is great, but we probably also want some sort of maintenance commitment? Then there is the question of where to put it, since it would be shared by all the email client code. So unfortunately a PEP might be needed; I'm not entirely clear on current procedures, so hopefully Barry can speak to that. I think there is other auth code shared between those modules (or that could be shared) that could go in a common location, but I'd have to refresh my memory on those modules before I could say for sure.

So I see two orthogonal tasks here, and we could decide we should do either or both: we can get the possibility of non-ascii passwords working by allowing bytes passwords, and we can implement RFC compliant support for non-ascii unicode passwords, which is the much bigger job that this PR is focused on. I don't think there's currently any PR for the first one :)

Oh, another thought on location for saslprep: in an ideal world we might reorganize the stdlib so that poplib, imaplib, and smtplib were all under the 'email' package. In that case saslprep would also go there. So maybe we just wink and put it there? Maybe we could avoid having to do a PEP that way? Or should we do one anyway?
